-
Notifications
You must be signed in to change notification settings - Fork 301
document data #2049
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
document data #2049
Conversation
This isn't a full review, but I came by quickly to see if this pkgdown failure is related to one seen in a different PR. It doesn't seem to be. The pkgdown CI failure appears to be real because the new functions need to be added to the reference index. So that would be good to fix up to make this PR complete. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
As part of Tidyverse Dev Day, I am helping @jennybc move this PR forward.
Having spoken with Jenny, we agree this is an important capability to bring into usethis and that this PR is greatly appreciated.
We also realize that this PR has been around for a while, and the widespread adoption
of LLM (e.g. Positron) is changing a lot of workflows. For example, we might imagine providing a skeleton
containing material known to be true, which could provide an AI assistant with the opportunity to
fill in the blanks on more-subjective things, like the description.
As such, my approach to this review is to focus on a minimal set of things we can merge to move the ball forward.
It's difficult to add a function to be exported, then to remove it in case we want to do things differently.
So I think it could be useful to focus on merging the internal functions, using that as a starting point for further work.
Would you consider, for the purposes of this PR, removing the use_document_data()
function, so that work can continue on
create_data_description()
, generate_data_info()
, and generate_variable_descriptions()
?
As well. would you consider contributing some tests for generate_data_info()
, and generate_variable_descriptions()
?
Thanks again for taking a swing at this!
data_info <- generate_data_info(dataset) | ||
variable_descriptions <- generate_variable_descriptions(dataset) | ||
|
||
description_template <- paste0( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
In time, this could be done using template file and the use_template()
function. I would not worry about this now, but it would not surprise me if this were used, once things become more settled.
} | ||
|
||
generate_data_info <- function(dataset) { | ||
paste0( |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Within usethis, newer functions use glue()
, rather than paste0()
, would you consider refactoring?
|
||
generate_variable_descriptions <- function(dataset) { | ||
purrr::map_chr(names(dataset), function(var) { | ||
paste0("\\item{", var, "} {", class(dataset[[var]]), ": Type label here}") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same here for glue()
, vs. paste0()
Potential fix for #1681. Here’s a summary of the relevant points discussed in the issue:
Existing Tools: The sinew package provides functions for documenting, but its workflow might overlap with
usethis
. More details can be found in the workflow guide.Alternative Solution: Another approach is proposed here, which handles various R objects and offers customizable documentation. However, it may require style translation and simplification to make it more accessible.
This Pull Request: The
use_document_data
function is simpler and easier to read, though it doesn’t offer the same level of customization as the alternative.